ABSTRACT

Based on the Sloan Digital Sky Survey Data Release 5 galaxy sample, we
explore photometric morphology classification and redshift estimation
of galaxies using photometric data and known spectroscopic redshifts.
An unsupervised method, the k-means algorithm, is used to separate the
whole galaxy sample into early- and late-type galaxies. We then
investigate photometric redshift measurement with different input
patterns by means of artificial neural networks (ANNs) for the total
sample and for the two subsamples. The experimental results indicate
that ANNs perform better when more parameters are included in the
training set, and that the mixed accuracy
$\sigma_{mix} = \sqrt{\sigma_{early}^2 + \sigma_{late}^2}$ of
photometric redshift estimation for the two subsets is superior to
$\sigma_z$ for the overall sample alone. For the optimal result, the
rms deviation of photometric redshifts is 0.0192 for the mixed sample
and 0.0196 for the overall sample, while it is 0.0164 and 0.0217 for
early- and late-type galaxies, respectively.

Key words: catalogs - galaxies: distances and redshifts - galaxies: general - galaxies: 
photometry - surveys - techniques: photometric 



1 INTRODUCTION

The Sloan Digital Sky Survey (SDSS, York et al. 2000) is an
astronomical survey project, covering more than a quarter of the sky,
that aims to construct the first comprehensive digital map of the
universe in 3D. The large amount of spectroscopic and photometric data
obtained by SDSS in recent years has opened a new horizon for the
study of galaxy properties such as galaxy evolution, clusters,
redshifts, the large-scale distribution of morphological types and so
on. Photometric classification and redshift estimation are therefore
of prime importance for the SDSS project. Obtaining reliable object
types and redshift estimates from SDSS photometry is an extremely
valuable adjunct to the spectroscopic sample.

One of the first segregations discovered in galaxy clusters was the
morphological one. The first evidence of such segregation dates from
Curtis (1918) and Hubble & Humason (1931), and it was quantified by
Oemler (1974) and Melnick & Sargent (1977). The problem of general
automated classification lies in the difficulty of finding
quantitative measures that correlate strongly with the Hubble
sequence, which is based on visual inspection. Shimasaku et al. (2001)
and Strateva et al. (2001), using SDSS data, showed that the ratio of
the Petrosian 50 percent light radius to the Petrosian 90 percent
light radius, $C_{in}$, measured in the r-band image is a useful index
for quantifying galaxy morphology. For early-type galaxies the
concentration index $C_{in}$ is larger than 2.5, while for late-type
galaxies $C_{in}$ is less than 2.5. Strateva et al. (2001) also found
that the color cut u - r = 2.22 efficiently separates early- and
late-type galaxies at z < 0.4. The basis for the classification of the
SDSS photometric database can be provided by objects whose nature is
precisely known from spectroscopy.

Photometric redshifts refer to redshift estimates of galaxies obtained
using only medium- or broad-band photometry or imaging instead of
spectroscopy. Techniques for deriving redshifts from broadband
photometry were pioneered by Baum (1962). Subsequent implementations
of these basic techniques were made by Couch et al. (1983) and Koo
(1985). In data-mining terms, photometric redshift estimation is a
regression task. In principle, the various approaches used for solving
regression problems may be applied to photometric redshift
measurement. So far there has been a great amount of research on
photometric redshift techniques, which are broadly grouped into three
kinds: the template-matching method, the empirical training-set method
and the instance-based learning method. The template-matching method
requires templates, and their quality directly influences the
performance of the photometric redshift prediction. The template
spectra come from population synthesis models (e.g. Bruzual & Charlot
1993) or from spectra of real objects (e.g. Coleman et al. 1980). The
empirical training-set method is based on real data, so whether the
data are sufficient and complete is an important factor. It is usually
implemented by a train-test or cross-validation procedure; in other
words, a classifier or regressor is fitted on a training set and then
evaluated on a test set. Typical empirical training-set methods
include artificial neural networks (ANNs, Collister & Lahav 2004;
Firth, Lahav & Somerville 2003; Vanzella et al. 2004; Li et al. 2007),
support vector machines (SVMs, Wadadekar 2005; Wang et al. 2007,
2008), ensemble learning and Gaussian process regression (Way &
Srivastava 2006), and linear and non-linear polynomial fitting
(Brunner et al. 1997; Wang, Bahcall & Turner 1998; Budavári et al.
2005; Hsieh et al. 2005; Connolly et al. 1995). Although the
instance-based learning method also relies on real data, it differs
from the training-set method in that it has no training process and
stores all the data in computer memory. Examples of such techniques
are k-nearest neighbours (e.g. Csabai et al. 2003; Ball et al. 2007;
Gao, Zhang & Zhao 2007), kernel regression (Wang et al. 2007, 2008)
and locally weighted regression.

This paper focuses on morphological classification of galaxies using
the k-means algorithm and on photometric redshift estimation of
galaxies using artificial neural networks (ANNs). The paper is
organized as follows. Section 2 gives the scheme of this paper.
Section 3 gives a brief overview of the k-means algorithm and of
artificial neural networks. Section 4 describes the photometric
classification of galaxies using the k-means algorithm. We investigate
redshift estimation using an extensive series of tests in Section 5.
The conclusions and discussion are summarized in Section 6.



3 PRINCIPLE 

3.1 K-means Algorithm 

The k-means algorithm (MacQueen 1967) is one of the simplest
unsupervised learning algorithms for clustering problems. It
partitions n objects, described by their attributes, into k clusters,
with k < n. The main idea is to define k centroids, one for each
cluster. The centroids should be placed carefully, because different
initial locations lead to different results; a good choice is to place
them as far from each other as possible. The next step is to take each
point of the data set and associate it with the nearest centroid. When
no point is left unassigned, the first pass is complete and an initial
grouping has been obtained. At this point we recalculate k new
centroids as the barycenters of the clusters resulting from the
previous step. With these k new centroids, each point of the data set
is again assigned to its nearest centroid. This produces a loop,
during which the k centroids change their locations step by step until
no more changes occur, that is, until the centroids no longer move.

The k-means algorithm is similar to the expectation-maximization
algorithm for mixtures of Gaussians in that both attempt to find the
centers of natural clusters in the data. It assumes that the object
attributes form a vector space. Its objective is to minimize the total
intra-cluster variance, or squared error function,

$$V = \sum_{i=1}^{k} \sum_{x_j \in S_i} |x_j - \mu_i|^2, \qquad (1)$$

where there are k clusters $S_i$, $i = 1, 2, \ldots, k$, and $\mu_i$
is the centroid or mean point of all the points $x_j \in S_i$.

Although it can be proved that the procedure always terminates, the
k-means algorithm does not necessarily find the optimal configuration
corresponding to the global minimum of the objective function. The
algorithm is also quite sensitive to the initial, randomly selected
cluster centers. It can be run multiple times to reduce this effect.
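
The loop described in this section, together with the multiple-restart remedy, can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library; the function names (`kmeans_once`, `kmeans`), the random initialisation from data points, the number of restarts and the toy two-blob data are assumptions made for the example, not the configuration used in this work.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            for p in points]

def mean_point(members):
    """Barycenter of a non-empty list of points."""
    return tuple(sum(coords) / len(members) for coords in zip(*members))

def kmeans_once(points, k, rng, max_iter=100):
    # Start from k data points chosen at random (one possible initialisation).
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(mean_point(members) if members else centroids[i])
        if new_centroids == centroids:  # centroids no longer move
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    # Squared error function: sum of squared distances to the own centroid.
    objective = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, objective

def kmeans(points, k, n_restarts=10, seed=0):
    """Run k-means several times and keep the run with the lowest objective."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

# Toy data: two well-separated blobs in a 2-D feature space.
data_rng = random.Random(1)
toy = [(data_rng.gauss(0.0, 0.1), data_rng.gauss(0.0, 0.1)) for _ in range(50)]
toy += [(data_rng.gauss(3.0, 0.1), data_rng.gauss(3.0, 0.1)) for _ in range(50)]
centroids, labels, objective = kmeans(toy, k=2)
```

Keeping the run with the smallest objective is one simple way of reducing the sensitivity to the initial centers; it does not guarantee that the global minimum is found.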



2 THE SCHEME OF THIS PAPER 

This paper demonstrates the potential of bulk classification of the
SDSS data and indicates a wide range of research applications,
especially for redshift estimation. The k-means algorithm offers an
efficient way to identify the physical nature of SDSS sources, so it
has strong potential to become an important classification tool for
the bulk of the SDSS photometric database. We collect the SDSS Data
Release 5 galaxy sample. The k-means algorithm is then applied to this
sample for two purposes: one is to preprocess the sample by removing
outliers; the other is to automatically separate the preprocessed
sample into two morphological classes (namely early- and late-type
galaxies). After that, we consider two cases for photometric redshift
estimation with different input patterns by artificial neural networks
(ANNs). The first is to apply ANNs directly to the total preprocessed
sample, and the second is to apply ANNs to the early- and late-type
galaxies separately. Finally, the results of the two cases are
compared.



3.2 Artificial Neural Networks 

An Artificial Neural Network (ANN) is an information processing
paradigm inspired by the way biological nervous systems, such as the
brain, process information. The key element of this paradigm is the
novel structure of the information processing system. ANNs are
collections of interconnected neurons, each capable of carrying out
simple processing. Thus, they are composed of massively parallel
distributed processors that have an inherent property of storing
experiential knowledge and making it available for use. The knowledge
is acquired by the network through a learning process and is stored in
interneuron connection strengths known as synaptic weights (Haykin
1994). Practical applications of ANNs most often employ supervised
learning. For supervised learning, one must provide training data that
include both the input (a set of parameter vectors, each vector
corresponding here to a galaxy) and the desired result or target value
(the corresponding redshifts). After the network has been trained
successfully, one can present input data alone to the ANN (that is,
input data without the desired result), and the ANN will compute an
output value that approximates the desired result. This is achieved by
using a training algorithm to minimize the cost function, which
represents the difference (error) between the actual and desired
output. The cost function E is commonly of the form

$$E = \frac{1}{p} \sum_{k=1}^{p} (o_k - t_k)^2, \qquad (2)$$

where $o_k$ and $t_k$ are the output and target, respectively, for the
k-th object, and p is the sample size. Generally the topology of an
ANN can be schematized as a set of N layers (see Fig. 1), with each
layer composed of a number of neurons. The first layer (i = 1) is
usually called the "input layer", the intermediate ones the "hidden
layers", and the last one (i = N) the "output layer". This kind of ANN
is formally known as a "multilayer perceptron" (MLP). Each neuron j in
layer s computes a weighted sum of the M outputs $z_i^{(s-1)}$ of the
previous layer (s - 1) and, through either a linear or a non-linear
function, produces an output,

$$z_j^{(s)} = f\left( \sum_{i=0}^{M} w_{ji}^{(s)} z_i^{(s-1)} \right). \qquad (3)$$

Here $w_{j0}$ denotes the bias for the hidden unit j, and f is an
activation function such as the continuous sigmoid or, as used here,
the tanh function, which has an output range of -1 to 1:

$$f(x) = \frac{2}{1 + e^{-2x}} - 1. \qquad (4)$$

When the entire network has been executed, the output of the last
layer is taken as the output of the entire network. The free
parameters of ANNs are the weight vectors. During the training
session, the weights of the connections are adjusted so as to minimize
the total error function. The learning procedure is the so-called
"back propagation". The number of layers, the number of neurons in
each layer and the activation functions are chosen at the outset and
specify the so-called "architecture" of the ANN. Neural networks learn
by example. The user gathers representative data into a training set
and initializes the weight vector with a random seed, then invokes the
training algorithm to learn the structure of the data automatically.
Here we use a method that is popular in neural network research: the
Levenberg-Marquardt method (Levenberg 1944; Marquardt 1963; also
detailed in Bishop 1995). It has the advantage of converging very
quickly to a minimum of the error function. This error function may
have not just a global minimum in the multidimensional weight space
but also a number of local minima. In general, networks trained on
exactly the same training set for the same number of epochs but with
different initial weights (i.e. different starting points in this
space) will converge to slightly different final weights. In order to
avoid (possible) over-fitting during the training, another part of the
data can be reserved as a validation set (independent of both the
training and test sets, and so not used in updating the weights) and
used during the training to monitor the generalization error. After a
chosen number of training iterations, the training terminates and the
final weights chosen for the ANN are those from the iteration at which
the cost function is minimal on the validation set. This is useful to
avoid over-fitting to the training set when the training set is small,
but the disadvantage of this technique is that it reduces the amount
of data available for both training and validation, which is
particularly undesirable if the data set is small to begin with.

Figure 1. Schematic diagram of an ANN
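
As an illustration of the ideas in this section (a tanh hidden layer, a squared-error cost, back propagation, and selecting the weights at the minimum of the validation cost), a minimal sketch is given below. It is not the network used in this work: it assumes a single hidden layer with a linear output unit, and it replaces the Levenberg-Marquardt method with plain stochastic gradient descent for brevity. The function names, the learning rate, the number of hidden units and the toy data are all assumptions made for the example.

```python
import math
import random

def forward(x, w_hidden, w_out):
    """One hidden tanh layer followed by a linear output unit.

    Element 0 of every weight list is the bias term.
    """
    hidden = [math.tanh(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in w_hidden]
    output = w_out[0] + sum(wi * hi for wi, hi in zip(w_out[1:], hidden))
    return hidden, output

def cost(data, w_hidden, w_out):
    """Mean squared difference between network output and target."""
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2 for x, t in data) / len(data)

def train(train_set, valid_set, n_hidden=4, epochs=200, rate=0.05, seed=0):
    rng = random.Random(seed)  # random initial weights
    n_in = len(train_set[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    best_cost, best_weights = float("inf"), None
    for _ in range(epochs):
        for x, t in train_set:
            hidden, out = forward(x, w_hidden, w_out)
            err = out - t
            # Back-propagate the error through the tanh units,
            # using d tanh(a)/da = 1 - tanh(a)^2.
            deltas = [err * w_out[j + 1] * (1.0 - hidden[j] ** 2)
                      for j in range(n_hidden)]
            w_out[0] -= rate * err
            for j in range(n_hidden):
                w_out[j + 1] -= rate * err * hidden[j]
                w_hidden[j][0] -= rate * deltas[j]
                for i in range(n_in):
                    w_hidden[j][i + 1] -= rate * deltas[j] * x[i]
        # Keep the weights from the epoch with the lowest validation cost.
        valid_cost = cost(valid_set, w_hidden, w_out)
        if valid_cost < best_cost:
            best_cost = valid_cost
            best_weights = ([row[:] for row in w_hidden], w_out[:])
    return best_cost, best_weights

# Toy regression problem standing in for (photometric inputs -> redshift).
data_rng = random.Random(1)

def make_sample(n):
    sample = []
    for _ in range(n):
        x = (data_rng.uniform(-1, 1), data_rng.uniform(-1, 1))
        sample.append((x, 0.3 * x[0] - 0.2 * x[1]))
    return sample

best_cost, (w_hidden, w_out) = train(make_sample(200), make_sample(100))
```

Because the validation set is never used to update the weights, the stored `best_cost` serves as the monitor of the generalization error described above.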

An ANN is configured for a specific application, such as pattern
recognition or data classification, through a learning process. ANNs
have various popular applications in astronomy, for example
star/galaxy separation (e.g. Odewahn & Nielsen 1994; Bertin & Arnouts
1996), morphological classification of galaxies (Nielsen & Odewahn
1994; Lahav et al. 1996; Ball et al. 2004), spectral classification
(Folkes et al. 1996; Weaver 2000), classification of astronomical
objects (Zhang & Zhao 2004, 2007) and photometric redshift estimation
(e.g. Firth et al. 2003; Vanzella et al. 2004; Li et al. 2007;
D'Abrusco et al. 2007). For reviews of ANNs applied in astronomy,
refer to Serra-Ricart et al. (1993), Miller (1993), Storrie-Lombardi &
Lahav (1994) and Li et al. (2006). Bailer-Jones (1996, 2000) also
addressed this issue.



4 MORPHOLOGY CLASSIFICATION 
4.1 Chosen Galaxy Sample 

The Sloan Digital Sky Survey (SDSS) is the most ambitious 
astronomical survey ever undertaken. The SDSS uses a ded- 
icated, 2.5-meter telescope on Apache Point, New Mexico, 
equipped with two powerful special-purpose instruments. 
The SDSS completed its first phase of operations - SDSS-I 
- in June, 2005. Over the course of five years, SDSS-I im- 
aged more than 8,000 square degrees of the sky in five band- 
passes, detecting nearly 200 million celestial objects, and 
it measured spectra of more than 675,000 galaxies, 90,000 
quasars, and 185,000 stars. These data have supported stud- 
ies ranging from asteroids and nearby stars to the large 
scale structure of the Universe. The SDSS has entered a new 
phase, SDSS-II, continuing through June, 2008. SDSS-II will
carry out three distinct surveys - the Sloan Legacy Survey, 
SEGUE, and the Sloan Supernova Survey - to address fun- 
damental questions about the nature of the Universe, the 
origin of galaxies and quasars, and the formation and evo- 
lution of our own Galaxy, the Milky Way. 

We downloaded 582,512 galaxies from the SDSS DR5 database, taking only
objects with photometry available in all five bands. After removing
the records with default values, we obtained 582,257 galaxies.

4.2 K-means Algorithm for Morphology Classification

The origin of the morphology of galaxies is a longstanding issue that
could provide a key to discerning among models of galaxy formation.
There appears to be a general correlation between galaxy color and
Hubble morphology: Strateva et al. (2001) demonstrated that the galaxy
color u - r is related to the morphology of galaxies and grouped
galaxies into two families. In this section we also use galaxy color
indices to classify galaxies with the k-means algorithm.

As is well known, the k-means algorithm is an unsupervised approach
that clusters objects automatically according to their intrinsic
properties. Here we use it to cluster the sample described above into
two galaxy types (namely early-type and late-type galaxies). For this
experiment, we used five Petrosian color indices (u-g, g-r, r-i, i-z,
u-r) as the input parameters of the k-means approach. The algorithm
analyzes the color properties of the whole sample and separates it
into two classes automatically. As a result, the two families contain
300,903 and 281,354 galaxies, respectively.

In order to verify the type of each class, we randomly selected 1000
records from each family and confirmed their types by matching their
positions against the NASA/IPAC Extragalactic Database (NED). From
this check we found that the cluster with 300,903 records consists of
early-type galaxies and that the other 281,354 records belong to
late-type galaxies.
In Fig. 2 and Fig. 3 we give the u - r histograms for the individual
subclasses (early- and late-type galaxies, respectively). The u - r
histogram of the total galaxy sample is shown in Fig. 4. The g - r
versus u - r diagram is displayed in Fig. 5, where red points are
early-type galaxies and black points are late-type galaxies. In
Fig. 5, the line marks u - r = 2.22. According to the u - r cut
(Strateva et al. 2001), galaxies with u - r > 2.22 are early-type
galaxies, while those with u - r < 2.22 are late-type galaxies. With
the k-means classification, however, the u - r value for early-type
galaxies lies in the range from 2.0 to 8.0, and that for late-type
ones extends up to 5.0. The classification results of the two methods
therefore differ. Morphological separation by the u - r cut alone can
suffer from degeneracies; for example, some low-redshift dusty edge-on
spirals could easily be misclassified as early-type galaxies. We
therefore consider the classification result of the k-means method
more reasonable than the u - r cut, because the algorithm uses more
information and classifies objects in a multi-parameter space. An
obvious strength of the k-means algorithm is that it clusters
automatically according to the properties of galaxies in the real
universe and requires no additional assumptions about their formation
and evolution.
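
To make the two classification inputs concrete, the short sketch below builds the five color indices used as k-means features and applies the single u - r = 2.22 cut for comparison. The function names and the example magnitudes are hypothetical and serve only as an illustration; the treatment of the boundary case u - r = 2.22 is not specified in the text and is an arbitrary choice here.

```python
def colour_indices(u, g, r, i, z):
    """Five colour indices (u-g, g-r, r-i, i-z, u-r) from five-band magnitudes."""
    return (u - g, g - r, r - i, i - z, u - r)

def ur_cut_class(u, r, cut=2.22):
    """Classify with the single u - r colour cut.

    The boundary case u - r == cut is assigned to "late" (arbitrary choice).
    """
    return "early" if u - r > cut else "late"

# Hypothetical Petrosian magnitudes for one galaxy (illustrative values only).
mags = {"u": 19.0, "g": 17.5, "r": 16.7, "i": 16.3, "z": 16.0}

features = colour_indices(**mags)            # feature vector for clustering
label = ur_cut_class(mags["u"], mags["r"])   # label from the u - r cut alone
```

The cut looks at one component of the feature vector only, whereas a clustering in the full five-dimensional color space can assign a galaxy with u - r > 2.22 to the late-type family, and vice versa.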



5 REDSHIFT ESTIMATION 
5.1 The Used Sample 

Before the photometric redshift experiments, we selected objects from
the total galaxy sample satisfying the following criteria (see also
Vanzella et al. 2004). The selected sample consists of galaxies with
r-band Petrosian magnitude brighter than 17.77; the spectroscopic
redshift confidence must be greater than 0.95 and there must be no
warning flags. With these restrictions, we obtained a sample of
375,929 galaxies from the 582,257 objects described in Section 4, of
which 191,200 are early-type galaxies and 184,729 are late-type
galaxies. The Galactic absorption in the different filters was
obtained from the dust maps of Schlegel et al. (1998).

Figure 2. The u - r histogram for early-type galaxies.

Figure 3. The u - r histogram for late-type galaxies.

Figure 4. The u - r histogram for the total galaxy sample.

Figure 5. The g - r versus u - r diagram; red points represent
early-type galaxies and black points represent late-type galaxies.

Figure 6. The comparison of predicted photometric redshifts with
spectroscopic redshifts for early-type galaxies.



5.2 Result 



ANNs are used to predict photometric redshifts for the selected galaxy
samples. From the total sample, we randomly selected 150,000 galaxies
for training and 50,000 for validation, and used the remaining 175,929
as the test sample. Training yields a regressor, which can then be
used to predict the photometric redshifts of the test sample. The
root-mean-square (rms) redshift error is denoted $\sigma_z$.
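
The random partition and the rms error can be sketched as follows. The helper names `split_sample` and `rms_error` and the fixed random seed are assumptions made for this illustration; only the sample sizes are taken from the text.

```python
import math
import random

def split_sample(n_total, n_train, n_valid, seed=0):
    """Randomly partition indices 0..n_total-1 into training, validation and test sets."""
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # everything that is left over
    return train, valid, test

def rms_error(z_phot, z_spec):
    """Root-mean-square difference between predicted and spectroscopic redshifts."""
    total = sum((p - s) ** 2 for p, s in zip(z_phot, z_spec))
    return math.sqrt(total / len(z_spec))

# Sizes quoted for the total sample: 150,000 / 50,000 / remainder.
train_idx, valid_idx, test_idx = split_sample(375_929, 150_000, 50_000)
```

With these sizes the test set contains the remaining 375,929 - 200,000 = 175,929 objects, as stated above.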

Similarly, we randomly partitioned the early-type galaxy sample into
80,000 galaxies for training, 20,000 for validation and 91,200 for
testing. The late-type galaxy sample was likewise separated into
training, validation and test sets of 80,000, 20,000 and 84,729
galaxies, respectively. We also applied ANNs to predict the
photometric redshifts of the two subclasses separately. Their mixed
accuracy is then calculated as follows:

$$\sigma_{mix} = \sqrt{\frac{\sum_{i=1}^{N_1} (z_{NN,i} - z_{spec,i})^2 + \sum_{i=1}^{N_2} (z_{NN,i} - z_{spec,i})^2}{N_1 + N_2}}, \qquad (5)$$

where $z_{NN,i}$ is the neural output, $z_{spec,i}$ is the target,
$N_1$ is the number of early-type galaxies in the test sample, and
$N_2$ is the number of late-type galaxies in the test sample. Finally,
we compared the mixed accuracy with that of the total galaxy sample
alone.
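
As a quick arithmetic check, and assuming that the mixed accuracy is the rms error pooled over the early- and late-type test sets (an assumption of this sketch), it can be computed from the per-class rms values and test-set sizes quoted in this paper. The function name `mixed_rms` is introduced here for illustration only.

```python
import math

def mixed_rms(sigma_1, n_1, sigma_2, n_2):
    """Pooled rms of two subsamples from their individual rms values and sizes.

    Uses the identity sum of squared errors = n * sigma^2 for each subsample.
    """
    return math.sqrt((n_1 * sigma_1 ** 2 + n_2 * sigma_2 ** 2) / (n_1 + n_2))

# Reported rms deviations and test-set sizes for early- and late-type galaxies.
sigma_mix = mixed_rms(0.0164, 91_200, 0.0217, 84_729)
```

With the rounded inputs above the result is roughly 0.019, in line with the value of 0.0192 reported for the mixed sample.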

The experimental results with different input parameters and different
samples, obtained with different ANN structures, are given in Table 1.
The ANN structures applied differ from sample to sample. When applying
ANNs to an actual photometric target sample, the whole procedure
should be run several times with the test set, modifying the training
parameters (e.g. weight decay, the number of hidden layers) in order
to optimize the performance.
As shown in Table 1, the results based on model magnitudes are better
than those based on Petrosian magnitudes, while those based on
dereddened magnitudes are superior to those based on model magnitudes.
We also find that the performance improves when the parameters
(PetR50, PetR90) are added. In all situations, the rms scatter of
photometric redshifts for early-type galaxies is smaller than that for
late-type galaxies. Compared with the rms scatter of the total galaxy
sample, the mixed scatter of photometric redshift estimation improves
when galaxies are divided into early-type and late-type ones. This
conclusion is similar to that of Wang et al. (2007). For example, when
dereddened u, g, r, i, z, PetR50 and PetR90 are taken as the input
pattern, the rms deviation of photometric redshifts is 0.0164 for
early-type galaxies, 0.0217 for late-type galaxies, 0.0192 for the
mixed sample and 0.0196 for the total sample. To show the results
clearly, the comparisons of predicted photometric redshifts with
spectroscopic redshifts for early-type galaxies, late-type galaxies
and the overall sample are plotted in Figs 6-8. These figures indicate
that the experimental results for early-type galaxies and for the
overall sample are very good, while the result for late-type galaxies
is not as good. The correlation coefficient R supports this
conclusion: R is 0.0952 for early-type galaxies, 0.0881 for late-type
galaxies and 0.934 for the overall sample. The better photometric
redshift accuracy of early-type galaxies may be due to the fact that
their spectra show a more prominent break at 4000 Å and therefore a
stronger photo-z signal.

Table 1. Photometric redshift prediction with artificial neural networks

Figure 7. The comparison of predicted photometric redshifts with
spectroscopic redshifts for late-type galaxies.

Figure 8. The comparison of predicted photometric redshifts with
spectroscopic redshifts for the overall galaxies.



6 CONCLUSIONS AND DISCUSSIONS 

First we employed an unsupervised method, the k-means algorithm, to
subdivide the total galaxy sample into two subclasses. The objects are
clustered by their intrinsic properties. By consulting the NED
database, we found that the two subclasses correspond to early-type
and late-type galaxies, respectively. Based on the total sample and
the two subsamples, we carried out various experiments with different
parameters for photometric redshift estimation by means of ANNs. The
experimental results indicate that, whether Petrosian, model or
dereddened magnitudes are employed, the more parameters are
considered, the higher the accuracy is. When parameters are added to
the training data, the network has more information with which to
improve its capability of prediction and generalization, so the final
accuracy improves correspondingly. This result is consistent with the
work of Li et al. (2007). This is a typical characteristic of ANNs,
which can be trained directly on problems with hundreds or thousands
of inputs, whereas for kernel regression and support vector machines
(SVMs) an optimal choice of input pattern is necessary (Wang et al.
2007, 2008). Table 1 shows that the accuracy of photometric redshifts
with the mixed sample outperforms that with the overall sample in the
different situations. In the best experimental result, the prediction
accuracy is $\sigma_z = 0.0196$ for the overall sample,
$\sigma_{mix} = 0.0192$ for the mixed sample, $\sigma_{early} = 0.0164$
for early-type galaxies and $\sigma_{late} = 0.0217$ for late-type
galaxies.

Many efforts have been made so far on photometric redshifts with ANNs.
Because different works are based on different samples, different
attributes and different architectures, we give only a rough
comparison of the ANNs applied to photometric redshift estimation in
different references, as shown in Table 2. Compared with the earlier
work (see Table 2), the scheme of estimating photometric redshifts
after classifying galaxies into early-type and late-type ones is
applicable and satisfactory; moreover, the scheme helps to study
galaxies in detail and improves the efficiency of photometric redshift
estimation. The improvement in the accuracy of photometric redshift
estimation is of great importance to the study of the large-scale
structure of the universe as well as the formation and evolution of
galaxies. As the quality and quantity of observational data increase,
more and more parameters become available for this problem. ANNs will
show their superiority in tackling this complex situation and will
find wide application (e.g. classification, regression, feature
selection) in astronomy. In addition, unsupervised approaches do not
require prior knowledge of the classes and mainly use clustering
algorithms to classify data. These procedures can be used to determine
the number and location of the unimodal classes and can help
astronomers to find unusual or unknown objects or phenomena.